Information bottleneck theory
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
- Information Technology > Data Science (0.68)
Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality
Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information-efficient learning algorithms minimize residual information while maximizing the relevant bits, which are predictive of the unknown generative models. We solve this optimization to obtain the information content of optimal algorithms for a linear regression problem and compare it to that of randomized ridge regression. Our results demonstrate the fundamental trade-off between residual and relevant information and characterize the relative information efficiency of randomized regression with respect to optimal algorithms.
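A plausible formalization of this trade-off, assuming the Markov structure generative model $w \to$ training data $D \to$ fitted parameters $\hat{\theta}$ (the symbols are my choice; the abstract defines only the quantities): the relevant bits are $I(\hat{\theta}; w)$, the residual information is $I(\hat{\theta}; D \mid w)$, and the chain rule splits the total information a fitted model stores about its training data as

$I(\hat{\theta}; D) \;=\; I(\hat{\theta}; w) \;+\; I(\hat{\theta}; D \mid w)$

so an information-efficient algorithm would trade the two terms off through a Lagrangian of the form

$\min_{p(\hat{\theta} \mid D)} \; I(\hat{\theta}; D \mid w) \;-\; \beta \, I(\hat{\theta}; w)$

with $\beta$ sweeping out the optimal frontier against which randomized ridge regression is compared.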
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
Zhu, Zhiyu, Jin, Zhibo, Zhang, Jiayu, Yang, Nan, Huang, Jiahao, Zhou, Jianlong, Chen, Fang
The task of identifying multimodal image-text representations has garnered increasing attention, particularly with models such as CLIP (Contrastive Language-Image Pretraining), which demonstrate exceptional performance in learning complex associations between images and text. Despite these advancements, ensuring the interpretability of such models is paramount for their safe deployment in real-world applications, such as healthcare. While numerous interpretability methods have been developed for unimodal tasks, these approaches often fail to transfer effectively to multimodal contexts due to inherent differences in the representation structures. Bottleneck methods, well-established in information theory, have been applied to enhance CLIP's interpretability. However, they are often hindered by strong assumptions or intrinsic randomness. To overcome these challenges, we propose the Narrowing Information Bottleneck Theory, a novel framework that fundamentally redefines the traditional bottleneck approach. This theory is specifically designed to satisfy contemporary attribution axioms, providing a more robust and reliable solution for improving the interpretability of multimodal models. In our experiments, compared to state-of-the-art methods, our approach enhances image interpretability by an average of 9%, text interpretability by an average of 58.83%, and accelerates processing speed by 63.95%. Our code is publicly accessible at https://github.com/LMBTough/NIB.
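The abstract does not spell out the NIB objective itself, but the bottleneck-style attribution idea it builds on can be illustrated with a generic perturbation probe: replace one spatial feature at a time with Gaussian noise and record the drop in image-text similarity. A minimal sketch, assuming a CLIP-like (D, H, W) feature map and a cosine-similarity readout; this is not the authors' NIB method.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def bottleneck_saliency(feat, text_emb, sigma=1.0, n_samples=16, seed=0):
        # Generic bottleneck-style attribution sketch (not NIB itself): score
        # each spatial location by how much image-text similarity drops when
        # that location's feature vector is replaced by Gaussian noise.
        rng = np.random.default_rng(seed)
        D, H, W = feat.shape
        base = cosine(feat.mean(axis=(1, 2)), text_emb)  # unperturbed similarity
        saliency = np.zeros((H, W))
        for i in range(H):
            for j in range(W):
                drops = []
                for _ in range(n_samples):
                    noisy = feat.copy()
                    noisy[:, i, j] = rng.normal(0.0, sigma, size=D)  # noise injection
                    drops.append(base - cosine(noisy.mean(axis=(1, 2)), text_emb))
                saliency[i, j] = np.mean(drops)          # mean similarity drop
        return saliency

    # Usage with random stand-ins for real CLIP features:
    sal = bottleneck_saliency(np.random.randn(64, 7, 7), np.random.randn(64))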
Exploring Information Processing in Large Language Models: Insights from Information Bottleneck Theory
Yang, Zhou, Qi, Zhengyu, Ren, Zhaochun, Jia, Zhikai, Sun, Haizhou, Zhu, Xiaofei, Liao, Xiangwen
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks by understanding input information and predicting corresponding outputs. However, the internal mechanisms by which LLMs comprehend input and make effective predictions remain poorly understood. In this paper, we explore the working mechanism of LLMs in information processing from the perspective of Information Bottleneck Theory. We propose a non-training construction strategy to define a task space and identify the following key findings: (1) LLMs compress input information into specific task spaces (e.g., sentiment space, topic space) to facilitate task understanding; (2) they then extract and utilize relevant information from the task space at critical moments to generate accurate predictions. Based on these insights, we introduce two novel approaches: Information Compression-based In-Context Learning (IC-ICL) and Task-Space-guided Fine-Tuning (TS-FT). IC-ICL enhances reasoning performance and inference efficiency by compressing retrieved example information into the task space. TS-FT employs a space-guided loss to fine-tune LLMs, encouraging the learning of more effective compression and selection mechanisms. Experiments across multiple datasets validate the effectiveness of the task space construction. Additionally, IC-ICL not only improves performance but also accelerates inference speed by over 40%, while TS-FT achieves superior results with a minimal strategy adjustment.
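The abstract's notion of compressing hidden states into a task space can be illustrated with a simple non-training construction: span the embeddings of task label words and measure how much of a hidden state's energy lies inside that subspace. A hedged toy sketch; the label-word construction and the projection-ratio metric are my assumptions, not the paper's exact procedure.

    import numpy as np

    def task_space_basis(label_embs):
        # Orthonormal basis for a "task space" spanned by label-word embeddings,
        # e.g. embeddings of "positive"/"negative" for a sentiment task
        # (assumed construction, in the spirit of a non-training strategy).
        q, _ = np.linalg.qr(np.asarray(label_embs).T)  # columns span the space
        return q                                       # (d, k) orthonormal

    def compression_ratio(hidden, basis):
        # Fraction of a hidden state's energy lying in the task subspace;
        # values near 1 mean the state is compressed into the task space.
        proj = basis @ (basis.T @ hidden)              # orthogonal projection
        return float(proj @ proj / (hidden @ hidden + 1e-12))

    # Toy usage with random stand-ins for model embeddings:
    d = 128
    label_embs = np.random.randn(2, d)                  # two label-word embeddings
    h = 0.9 * label_embs[0] + 0.1 * np.random.randn(d)  # state near the task space
    print(compression_ratio(h, task_space_basis(label_embs)))  # close to 1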
Efficient and Effective Deep Multi-view Subspace Clustering
Lin, Yuxiu, Liu, Hui, Wang, Ren, Guo, Qiang, Zhang, Caiming
Recent multi-view subspace clustering achieves impressive results utilizing deep networks, where the self-expressive correlation is typically modeled by a fully connected (FC) layer. However, such methods still suffer from two limitations. i) The parameter scale of the FC layer is quadratic in the number of samples, resulting in high time and memory costs that significantly degrade their feasibility on large-scale datasets. ii) Extracting a unified representation that simultaneously satisfies minimal sufficiency and discriminability remains under-explored. To this end, we propose a novel deep framework, termed Efficient and Effective deep Multi-View Subspace Clustering (E$^2$MVSC). Instead of a parameterized FC layer, we design a Relation-Metric Net that decouples the network parameter scale from the number of samples for greater computational efficiency. Most importantly, the proposed method devises a multi-type auto-encoder to explicitly decouple consistent, complementary, and superfluous information from every view, supervised by a soft clustering assignment similarity constraint. Following information bottleneck theory and the maximal coding rate reduction principle, a sufficient yet minimal unified representation can be obtained while pursuing intra-cluster aggregation and inter-cluster separability within it. Extensive experiments show that E$^2$MVSC yields results comparable to existing methods and achieves state-of-the-art performance on various types of multi-view datasets.
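The abstract's key efficiency idea, replacing the N x N self-expressive FC layer with a shared pairwise metric network whose parameter count is independent of N, can be sketched as below. The two-layer MLP and the dimensions are illustrative assumptions, not the authors' Relation-Metric Net architecture.

    import torch
    import torch.nn as nn

    class RelationMetric(nn.Module):
        # Sketch of the decoupling idea: self-expressive coefficients come from
        # a shared network applied to pairs of embeddings, so parameters do not
        # scale with the number of samples N (unlike an N x N FC layer).
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
            )

        def forward(self, z):                     # z: (N, dim) unified embeddings
            n = z.size(0)
            zi = z.unsqueeze(1).expand(n, n, -1)  # (N, N, dim) row embeddings
            zj = z.unsqueeze(0).expand(n, n, -1)  # (N, N, dim) column embeddings
            c = self.net(torch.cat([zi, zj], dim=-1)).squeeze(-1)  # (N, N) scores
            c = c.masked_fill(torch.eye(n, dtype=torch.bool), float("-inf"))
            return torch.softmax(c, dim=1)        # row-normalized, no self-loops

    # Usage: reconstruct each sample from the others, as in self-expressive models.
    z = torch.randn(16, 32)
    C = RelationMetric(32)(z)
    recon_loss = torch.mean((C @ z - z) ** 2)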
Learning Sparse Neural Networks with Identity Layers
Ni, Mingjian, Chen, Guangyao, Zheng, Xiawu, Peng, Peixi, Yuan, Li, Tian, Yonghong
Sparsity in deep neural networks has been widely investigated as a way to maximize performance while reducing the size of overparameterized networks. Existing methods focus on pruning parameters during training using thresholds and metrics. Meanwhile, feature similarity between different layers has received little attention; in this paper we rigorously prove that it is highly correlated with network sparsity. Inspired by interlayer feature similarity in overparameterized models, we investigate the intrinsic link between network sparsity and interlayer feature similarity. Specifically, using information bottleneck theory, we prove that reducing interlayer feature similarity, as measured by Centered Kernel Alignment (CKA), improves the sparsity of the network. Building on this theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which utilizes CKA to reduce feature similarity between layers and increase network sparsity. In other words, layers of our sparse network tend to have their own identity compared to each other. Experimentally, we plug the proposed CKA-SR into the training process of sparse network training methods and find that it consistently improves the performance of several state-of-the-art sparse training methods, especially at extremely high sparsity. Code is included in the supplementary materials.
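Linear CKA between two layers' feature matrices is standard (Kornblith et al., 2019), so a minimal version of the regularization idea can be sketched: penalize summed pairwise CKA across layers. Only the CKA formula itself is standard; the all-pairs sum and the weight are assumed illustrative choices, not necessarily the CKA-SR objective.

    import torch

    def linear_cka(x, y):
        # Linear CKA between feature matrices x: (n, d1) and y: (n, d2),
        # computed on mean-centered features: ||X^T Y||_F^2 / (||X^T X||_F ||Y^T Y||_F).
        x = x - x.mean(dim=0, keepdim=True)
        y = y - y.mean(dim=0, keepdim=True)
        return (x.T @ y).norm() ** 2 / ((x.T @ x).norm() * (y.T @ y).norm() + 1e-12)

    def cka_sparsity_reg(features, weight=1e-3):
        # Sketch of a CKA-based regularizer: sum pairwise CKA over a list of
        # layer feature matrices; minimizing it pushes layers to differ
        # ("identity" layers).
        reg = 0.0
        for i in range(len(features)):
            for j in range(i + 1, len(features)):
                reg = reg + linear_cka(features[i], features[j])
        return weight * reg

    # Usage inside a training step (features gathered, e.g., via forward hooks):
    feats = [torch.randn(32, 64, requires_grad=True) for _ in range(3)]
    loss = cka_sparsity_reg(feats)
    loss.backward()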
Residual-based attention and connection to information bottleneck theory in PINNs
Anagnostopoulos, Sokratis J., Toscano, Juan Diego, Stergiopulos, Nikolaos, Karniadakis, George Em
Driven by the need for more efficient and seamless integration of physical models and data, physics-informed neural networks (PINNs) have seen a surge of interest in recent years. However, ensuring the reliability of their convergence and accuracy remains a challenge. In this work, we propose an efficient, gradient-less weighting scheme for PINNs that accelerates the convergence of dynamic or static systems. This simple yet effective attention mechanism is a function of the evolving cumulative residuals and aims to make the optimizer aware of problematic regions at no extra computational cost and without adversarial learning. We illustrate that this general method consistently achieves a relative $L^{2}$ error of the order of $10^{-5}$ using standard optimizers on typical benchmark cases from the literature. Furthermore, by investigating the evolution of weights during training, we identify two distinct learning phases reminiscent of the fitting and diffusion phases proposed by information bottleneck (IB) theory. Subsequent gradient analysis supports this hypothesis by aligning the transition from high to low signal-to-noise ratio (SNR) with the transition from the fitting to the diffusion regime of the adopted weights. This novel correlation between PINNs and IB theory could open future possibilities for understanding the underlying mechanisms behind the training and stability of PINNs and, more broadly, of neural operators.
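A gradient-less, residual-driven weighting of this kind can be sketched as an exponential moving average of normalized pointwise residual magnitudes that multiplies the loss. This follows the general recipe in the abstract; the decay and step-size values are assumed placeholders, not the paper's hyperparameters.

    import torch

    def update_rba_weights(weights, residuals, gamma=0.999, eta=0.01):
        # Residual-based attention sketch: per-point multipliers track an
        # exponential moving average of residual magnitudes, normalized by the
        # current maximum so the worst-fit points get the strongest emphasis.
        # detach() keeps the update gradient-free, as in the abstract.
        r = residuals.detach().abs()
        return gamma * weights + eta * r / (r.max() + 1e-12)

    # Usage in a PINN-style loop (placeholder residuals stand in for the PDE
    # operator applied to the network output):
    n = 1000
    weights = torch.zeros(n)
    for step in range(3):
        residuals = torch.randn(n, requires_grad=True)  # placeholder PDE residuals
        weights = update_rba_weights(weights, residuals)
        loss = torch.mean((weights * residuals) ** 2)   # weighted residual loss
        loss.backward()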
Multi-view Semantic Consistency based Information Bottleneck for Clustering
Yan, Wenbiao, Zhu, Jihua, Zhou, Yiyang, Wang, Yifei, Zheng, Qinghai
Multi-view clustering can make use of multi-source information for unsupervised clustering. Most existing methods focus on learning a fused representation matrix while ignoring the influence of private information and noise. To address this limitation, we introduce a novel Multi-view Semantic Consistency based Information Bottleneck for clustering (MSCIB). Specifically, MSCIB pursues semantic consistency to improve the learning process of the information bottleneck across different views. It aligns multiple views in the semantic space and jointly extracts the valuable consistent information of multi-view data. In this way, the semantic consistency learned from multi-view data helps the information bottleneck more precisely distinguish the consistent information and learn a unified feature representation with more discriminative consistent information for clustering. Experiments on various types of multi-view datasets show that MSCIB achieves state-of-the-art performance.
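The semantic-consistency idea, aligning each view's soft cluster assignments in a shared semantic space, can be illustrated with a simple consensus-KL loss. A generic sketch of cross-view alignment, not MSCIB's exact objective; the consensus target and temperature are assumptions.

    import torch
    import torch.nn.functional as F

    def semantic_consistency_loss(view_logits, tau=1.0):
        # Sketch of cross-view semantic alignment: each view's soft cluster
        # assignment distribution is pulled toward the average (consensus)
        # distribution via KL divergence.
        probs = [F.softmax(l / tau, dim=1) for l in view_logits]  # (N, K) per view
        consensus = torch.stack(probs).mean(dim=0)                # (N, K) target
        loss = 0.0
        for p in probs:
            # KL(consensus || p), averaged over samples
            loss = loss + F.kl_div(p.clamp_min(1e-8).log(), consensus,
                                   reduction="batchmean")
        return loss / len(probs)

    # Usage with two views' cluster logits:
    v1, v2 = torch.randn(32, 10), torch.randn(32, 10)
    print(semantic_consistency_loss([v1, v2]))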